Robust classification of heart valve sound based on adaptive EMD and feature fusion

Cardiovascular disease (CVD) is one of the leading causes of death worldwide. In recent years, heart sound analysis has attracted growing research attention as a means of diagnosing the disease. To effectively distinguish heart valve defects from normal heart sounds, this work applies adaptive empirical mode decomposition (EMD) and feature fusion techniques to heart sound classification. Adaptive EMD was first formulated by screening the intrinsic mode function (IMF) components with the correlation coefficient and the Root Mean Square Error (RMSE). Adaptive thresholds based on the Hausdorff Distance were then used to choose the IMF components used for reconstruction. The multidimensional features extracted from the reconstructed signal were ranked and selected. The waveform transformation and energy features of the heart sound signal can indicate the state of heart activity corresponding to various heart sounds. Here, a set of ordinary features was extracted from the time, frequency and nonlinear domains. To extract more compelling features and achieve better classification results, another four cardiac reserve time features were fused. The fusion features were sorted using six different feature selection algorithms. Three classifiers (random forest, decision tree, and K-nearest neighbor) were trained on an open source database and our own database. Compared with previous work, our extensive experimental evaluations show that the proposed method achieves the best results, with the highest accuracy of 99.3% (1.9% improvement in classification accuracy). These results verify the robustness and effectiveness of the fusion features and the proposed method.


Introduction
Heart sound is a weak biological signal produced by the systolic and diastolic motion of the human heart. It is also a vibration signal with nonlinear and non-stationary characteristics. Rheumatic heart disease refers to the effect of rheumatic fever on the heart valve, which results in heart valve disease. Its symptoms are stenosis or insufficiency of the mitral valve, tricuspid valve, and aortic valve. With the aging of the population, senile valvular disease, coronary heart disease, and valvular disease caused by myocardial infarction are becoming more and more common [1]. Feature extraction of heart sound signals has long been a research hotspot in the biomedical field. Existing research has proposed a variety of heart sound feature extraction methods in the time domain, frequency domain, and time-frequency domain. Among them, wavelet analysis is widely used because of its excellent ability to represent local signal information in the time and frequency domains. The S transform, an extension of the wavelet transform and the short-time Fourier transform (STFT) that overcomes deficiencies of both, is also used for time-frequency feature extraction of heart sound signals. However, most of these methods are based on linear time-varying or time-invariant models. Because the heart sound signal is nonlinear and non-stationary, linear analysis methods are bound to ignore some important information inside the signal [2]. Empirical Mode Decomposition (EMD) is a method for processing non-stationary signals proposed by Huang [3], a Chinese scientist at NASA, and is an essential part of the Hilbert-Huang Transform. Zhang et al. used EMD to remove wall components from mixed signals [4]. Their method removes wall components from composite signals more effectively and objectively.
The time-frequency analysis method based on EMD is suitable for nonlinear and non-stationary signals as well as linear and stationary signals, and it can adaptively decompose a signal into multiple intrinsic mode functions (IMFs). Each IMF component contains the local characteristics of the original signal at a different time scale. Analyzing the IMF components can therefore reflect the detailed characteristics of the original signal more accurately. Consequently, decomposing the complex heart sound signal with EMD and then extracting characteristic information from the resulting IMF components can reveal the intrinsic nature of the heart sound.
Feature extraction and analysis of heart sound signals is a significant part of establishing a cardiovascular disease diagnosis system [5][6][7]. Different features can reflect the state of heart function from various aspects. Statistical analysis of heart sound signals can therefore reveal the differences between heart valve defects and normal heart sounds, which can be used to discriminate different sound signals. Accordingly, ordinary feature sets are extracted from the time domain [8], frequency domain [9] and nonlinear space [10,11]. In addition to these features, four cardiac reserve times (T1, T2, T11 and T12) are also integrated in this paper [12]. Feature selection methods in machine learning play an important role in biomedical data analysis [13]. Feature selection techniques can be roughly divided into three types: filter methods, embedded methods, and wrapper methods [14]. Filter methods work independently of the classifier design and can be divided into two main groups, namely single feature evaluation and subset feature evaluation [15]. Wrapper and embedded methods interact with classifiers to achieve feature selection [16].
This paper proposes a feature reconstruction algorithm and feature fusion method based on adaptive EMD to classify heart valve diseases. It can effectively distinguish heart valve defects from normal heart sounds by selecting an essential subset of features. The main contributions are outlined as follows.
1. This paper improves an adaptive reconstruction method based on Hausdorff distance. After EMD transformation, the Hausdorff distance (HD) between the IMFs and the original heart sound signal was calculated. Then, according to the adaptive threshold based on Hausdorff distance, the appropriate IMF components were selected to reconstruct the heart sound signal. The proposed method has a better noise reduction effect, and the reconstructed heart sound signal has more obvious feature information.
2. A feature fusion method is proposed, which not only extracts time domain, frequency domain, and nonlinear features but also fuses four cardiac reserve time features. The proposed feature fusion method can improve the effectiveness and accuracy of heart sound classification.

Algorithm design of preprocessing
Fig 1 shows the overall method block diagram, which contains four main parts: preprocessing of the original heart sound, feature extraction, feature screening, and classification. Heart sound is a weak physiological signal, and its acquisition is inevitably contaminated by noise of varying strength due to various types of interference. Such noise may mask the original characteristics of the signal and affect the subsequent analysis of different heart sound signal types. The noise mainly includes environmental noise, power frequency noise, friction between the skin and the collection equipment, and interference from the instrument itself. Therefore, to retain as much of the valuable signal as possible, it is necessary to preprocess the original signal.

A. EMD
The Hilbert-Huang transform includes the Huang transform and Hilbert spectrum analysis. The Huang transform is also called Empirical Mode Decomposition (EMD) [10,11]. EMD, as a nonlinear and non-stationary signal analysis method, can decompose the heart sound signal into several intrinsic mode functions, and each IMF component carries local characteristics of the signal. In contrast to the wavelet transform, EMD does not require a wavelet function to be selected in advance; it adaptively decomposes the signal into IMF components ordered by frequency from high to low. Each IMF contains the local characteristics of the original signal at a different time scale. An IMF component must satisfy two restrictions: (i) over the whole sequence, the number of extreme points a and the number of zero-crossing points b satisfy |a - b| ≤ 1; (ii) at any point, the mean value of the upper and lower envelopes defined by the local extrema must be 0. For a time-series signal $s(t)$, the principle of EMD is as follows [31]:
1. Determine the local maximum and minimum points of the original heart sound signal $s(t)$, and fit the upper and lower envelopes $e_1(t)$ and $e_2(t)$, shown as the yellow and green curves in Fig 2.
2. Obtain the mean curve $m(t)$ of $e_1(t)$ and $e_2(t)$ as in Formula (1), shown as the red curve in Fig 2:
$$m(t) = \frac{e_1(t) + e_2(t)}{2} \quad (1)$$
Subtract $m(t)$ from $s(t)$ to get:
$$h_1(t) = s(t) - m(t) \quad (2)$$
If $h_1(t)$ does not satisfy the two IMF restrictions above, regard $h_1(t)$ as $s(t)$ and repeat the above steps until a component $h_k(t)$ that satisfies the restrictions is obtained, and record it as the first IMF component $c_1(t)$.
3. Calculate the component $r_1(t) = s(t) - c_1(t)$, use $r_1(t)$ as the original signal, and repeat the above steps until the end of the decomposition, finally obtaining $n$ IMF components and a remainder $r(t)$. The original signal can then be expressed as:
$$s(t) = \sum_{i=1}^{n} c_i(t) + r(t) \quad (3)$$
Fig 3 shows the time and frequency domain waveforms of the IMF components obtained from an abnormal heart sound valve defect signal (Aortic stenosis, AS) after EMD. The number of IMF components obtained by decomposition differs from signal to signal. The figure only shows the first ten components ($c_1$ to $c_{10}$) and the remainder $r$ of the empirical mode decomposition. As Fig 3(B) shows, after EMD processing, the original heart sound signal containing multiple frequency bands is decomposed into multi-layer IMF components ordered by frequency from high to low. The frequency content of the heart sound signal is mainly concentrated in the low-frequency part.
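The sifting procedure in steps 1 to 3 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes piecewise-linear envelopes (EMD implementations normally fit cubic splines) and a fixed number of sifting iterations in place of the two IMF restrictions. The names `local_extrema`, `envelope`, `sift_once`, and `emd` are hypothetical.

```python
import math


def local_extrema(x):
    """Return the indices of strict local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices (assumption:
    linear instead of spline interpolation), held flat outside the knots."""
    env = [0.0] * len(x)
    for t in range(knots[0]):
        env[t] = x[knots[0]]
    for t in range(knots[-1], len(x)):
        env[t] = x[knots[-1]]
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b):
            w = (t - a) / (b - a)
            env[t] = (1 - w) * x[a] + w * x[b]
    return env


def sift_once(x):
    """One sifting step: h(t) = x(t) - m(t), with m(t) the envelope mean."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None  # no oscillation left to extract
    e1, e2 = envelope(x, maxima), envelope(x, minima)
    return [v - (u + l) / 2 for v, u, l in zip(x, e1, e2)]


def emd(signal, max_imfs=10, sift_iters=10):
    """Return (imfs, residue) such that signal = sum(imfs) + residue."""
    imfs, residue = [], list(signal)
    for _ in range(max_imfs):
        h, extracted = residue, False
        for _ in range(sift_iters):  # assumption: fixed iteration count
            nxt = sift_once(h)
            if nxt is None:
                break
            h, extracted = nxt, True
        if not extracted:
            break
        imfs.append(h)
        residue = [r - c for r, c in zip(residue, h)]
    return imfs, residue


# Synthetic two-tone test signal (not heart sound data).
demo = [math.sin(2 * math.pi * 5 * j / 500) + 0.5 * math.sin(2 * math.pi * 40 * j / 500)
        for j in range(500)]
```

Calling `emd(demo)` returns the IMF list and the remainder; by construction their sum reproduces the input, which mirrors Formula (3).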

B. IMF selection and reconstruction
Since heart sound signal collection inevitably picks up noise from various sources of interference, the original characteristics of the heart sound signal are masked by the noise. It can be seen from Fig 3 that the essential information of the original signal is often concentrated in a few IMF components, so the interference that noise causes to the signal characteristics can be effectively reduced by screening for suitable IMF components. We choose two evaluation indicators: the Correlation Coefficient [32,33] and the Root Mean Square Error [34].
1) Correlation and root mean squared error. The correlation coefficient based on EMD is given by Formula (4), where $corr_i$ is the correlation coefficient between the original heart sound signal $s(j)$ and the i-th IMF component signal, $j$ indexes the samples of the signal, $c_i(j)$ is the j-th sample of the i-th IMF component obtained by EMD, and i = 1, 2, ..., L. The larger $corr_i$ is, the higher the correlation.
The RMSE formula based on EMD is as follows:
$$RMSE_i = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left( s(j) - c_i(j) \right)^2} \quad (5)$$
In Formula (5), $RMSE_i$ is the Root Mean Square Error between the original heart sound signal $s(j)$ and the i-th IMF component signal, often used as an error measurement, and $N$ is the number of samples. The smaller the RMSE value, the closer the component is to the original signal. From the $corr_i$ and $RMSE_i$ of the component signals and the original signal, the thresholds λ and δ are calculated by Formula (6). The judgment condition in Formula (7) is used for adaptive threshold judgment (the correlation coefficient of the IMF component is greater than or equal to λ, and the Root Mean Square Error is smaller than δ) to filter out the useful IMF components. Because the number of components L obtained by decomposition differs for each signal, the threshold values λ and δ differ, and the number of IMF components screened also differs.
Each IMF component signal carries some noise, which is uncorrelated with the heart sound signal and deviates significantly from it. This article therefore adaptively selects the most informative IMF components and reconstructs them into a sub-signal for subsequent feature extraction.
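The screening rule can be sketched as follows. Two points are assumptions made only for illustration, since Formulas (4) and (6) are not reproduced in this text: the correlation is taken to be the Pearson coefficient, and the thresholds λ and δ are taken to be the means of the correlation and RMSE values over all components. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (assumed form of Formula (4))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rmse(x, y):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def select_imfs(signal, imfs):
    """Keep the IMFs with corr_i >= lambda and RMSE_i < delta."""
    corr = [pearson(signal, c) for c in imfs]
    err = [rmse(signal, c) for c in imfs]
    lam = sum(corr) / len(corr)    # assumption: mean correlation as lambda
    delta = sum(err) / len(err)    # assumption: mean RMSE as delta
    keep = [i for i in range(len(imfs)) if corr[i] >= lam and err[i] < delta]
    return keep, lam, delta


def reconstruct(imfs, keep):
    """Sum the selected IMF components sample by sample."""
    return [sum(imfs[i][j] for i in keep) for j in range(len(imfs[0]))]


# Toy example: one dominant component plus a small orthogonal disturbance.
main = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
noise = [0.1, 0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1]
signal = [m + n for m, n in zip(main, noise)]
keep, lam, delta = select_imfs(signal, [noise, main])
```

In this toy case only the dominant component passes both tests, so the reconstructed sub-signal equals `main`.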
2) Algorithm flow. Fig 4 shows the adaptive reconstruction procedure. First, the Corr and RMSE between each IMF component signal and the original heart sound signal are calculated. The best component signals are then selected according to the adaptive threshold selection rule. Finally, the heart sound sub-signal is reconstructed. This sub-signal characterizes the useful information of the original heart sound in a more concentrated way and prepares for the extraction of the various characteristic parameters of the heart sound. Fig 6(A) and 6(B) show the correlation coefficients and RMSE values of the first nine IMF layers after the AS, MS, MR, MVP, and NHS heart sounds and the ASD, VSD, TOF, and NHS heart sounds are decomposed. The dotted lines represent the adaptive thresholds λ and δ. The IMF components lower than λ and higher than δ are interference components of the original signal, represented by black bars. The IMF components higher than λ and lower than δ carry the features most strongly correlated with the signal, represented by yellow and blue bars, respectively.

C. Improved adaptive reconstruction based on Hausdorff distance
Building on the above adaptive IMF reconstruction method, this paper also finds that using the first seven layers of IMF components can optimize the results. Therefore, the Hausdorff Distance (HD) between each IMF and the original signal is calculated, and the appropriate IMF components are then adaptively selected for reconstruction [35].
1) Hausdorff distance. The Hausdorff Distance (HD) is the max-min distance between two geometric objects and measures their degree of resemblance [8]. Given two nonempty point sets $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_n\}$, the HD between A and B is formulated as
$$H(A, B) = \max\{h(A, B),\; h(B, A)\}$$
where we have:
$$h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$$
where $\lVert \cdot \rVert$ is a norm distance metric, here the Euclidean distance.
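As a small illustration of the max-min definition, the sketch below computes the directed and symmetric Hausdorff distances between two finite point sets with the Euclidean norm. How a sampled signal is turned into a point set is not specified in the text; the helper `signal_to_points`, which pairs each sample index with its amplitude, is an assumption, and all function names are hypothetical.

```python
import math


def directed_hausdorff(A, B):
    """h(A, B): the largest distance from a point of A to its nearest point of B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


def signal_to_points(signal):
    """Assumption: represent a sampled signal as (index, amplitude) points."""
    return [(float(j), float(v)) for j, v in enumerate(signal)]


A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (3.0, 0.0)]
print(directed_hausdorff(A, B), directed_hausdorff(B, A), hausdorff(A, B))
# prints: 1.0 2.0 2.0
```

The example shows why both directions are needed: every point of A lies within 1 of B, but the point (3, 0) of B is 2 away from A. This brute-force form costs O(mn) distance evaluations, which is acceptable for a sketch but slow for long recordings.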
2) Proposed algorithm flow. Fig 5 shows the flow chart of the proposed algorithm, an improved adaptive threshold selection method based on HD. This paper improves the Hausdorff adaptive threshold applied to the IMFs and further selects more appropriate IMF components using a threshold formula in which L is the number of IMF components. First, after obtaining all the IMF components from EMD, the HD between each layer and the original signal is calculated, and the IMF components of the first seven layers are sorted by their HD values. The appropriate IMF components can then be adaptively selected by calculating the mean HD of the IMF components from the first seven layers. Finally, the heart sound sub-signal is reconstructed.
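A minimal sketch of this selection step is given below. It takes precomputed HD values, one per IMF layer, and uses the mean HD of the first seven layers as the threshold, as the text describes. The direction of the comparison (keeping the layers whose HD does not exceed the threshold, on the reasoning that a smaller HD means a closer match to the original signal) is an assumption, as is the function name `select_by_hd`.

```python
def select_by_hd(hd_values, n_layers=7):
    """Select IMF layers among the first n_layers using a mean-HD threshold.

    hd_values[i] is the Hausdorff distance between IMF layer i and the
    original signal. Returns (kept layer indices, threshold).
    """
    first = hd_values[:n_layers]
    threshold = sum(first) / len(first)
    # Assumption: a layer is kept when its HD is at most the threshold.
    keep = [i for i, d in enumerate(first) if d <= threshold]
    return keep, threshold


# Made-up HD values for nine IMF layers (not taken from the paper).
hd = [4.0, 1.0, 2.0, 1.0, 6.0, 7.0, 7.0, 0.5, 0.1]
keep, threshold = select_by_hd(hd)
print(keep, threshold)  # prints: [0, 1, 2, 3] 4.0
```

Layers beyond the seventh are ignored regardless of their HD value, which reflects the paper's observation that the first seven layers give the best results.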
Figs 6 and 7 show the comparison results of the two IMF component selection methods. In Fig 6(A), according to Formula (7), the Corr&RMSE threshold selection method is applied to the abnormal heart sound AS and selects the 2nd to 5th IMF layers to reconstruct the signal. In … of the heart sound signal before and after the EMD adaptive filtering and reconstruction. As the region marked by the red box shows, the noise in the high-frequency part of the heart sound is effectively suppressed by the reconstruction. Therefore, the reconstruction operation effectively reduces the noise of the heart sound signal and yields heart sound sub-signals that are more similar to the original heart sound.

A. Feature fusion
Different features can reflect the state of heart function from various aspects. Therefore, statistical analysis of heart sound signals can reveal the differences between heart valve defect sounds and normal heart sound signals, which can be used to discriminate different heart sound signals. Existing studies have shown that the waveform transformation characteristics, energy characteristics and complexity characteristics of heart sound signals can reflect the cardiac activity state of different heart sound samples. Therefore, this section extracts complementary feature sets from the time domain, frequency domain and nonlinear domain and combines them with four cardiac reserve time features. The resulting features are then used for subsequent feature screening and classification.

3) Nonlinear domain features. Information describes the state of movement and the existence of objective things, and information entropy characterizes the complexity of data from the perspective of information theory. Analyzing the signal from the standpoint of nonlinearity yields the corresponding load-type indicators that describe the signal data. Four kinds of nonlinear features are used: Approximate Entropy [36], Sample Entropy, Multi-Scale Permutation Entropy, and Exponential Entropy.
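To make one of these nonlinear features concrete, the sketch below computes sample entropy in its common textbook form: it counts pairs of templates of length m and m + 1 that match within a tolerance r (Chebyshev distance, self-matches excluded) and returns the negative log of their ratio. The parameter choices m = 2 and r = 0.2 times the standard deviation are conventional defaults and an assumption here; the paper does not state its settings.

```python
import math
import random


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of sequence x with template length m.

    The tolerance is r = r_factor * std(x). Returns inf when no template
    pair matches, since the ratio is then undefined.
    """
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    r = r_factor * std

    def count_matches(length):
        # Use the same n - m start positions for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if all(abs(x[i + k] - x[j + k]) <= r for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


# A perfectly regular sequence versus a pseudo-random one (synthetic data).
regular = [0.0, 1.0] * 50
rng = random.Random(0)
irregular = [rng.random() for _ in range(200)]
```

The regular sequence gives an entropy of zero because every length-2 match extends to a length-3 match, while the pseudo-random sequence gives a larger value. The double loop is quadratic in the signal length, so a practical implementation would need a faster search for long recordings.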
As shown in Table 1 below, the feature set extracted in our paper includes the above 25 types of time-domain features, 11 types of frequency-domain features, and four types of nonlinear characteristics. After feature extraction, 40 types of features are labelled to facilitate subsequent feature screening experiments.

4) Cardiac reserve time features.
According to the established single-degree-of-freedom vibration model, auscultation of heart sound is compared to the duration of low-frequency sound or sound pressure captured by the tympanic membrane to extract the corresponding time-limited features. T1 is the time limit of S1, T2 is the time limit of S2, T12 is the time limit from S1 to S2 in the same cardiac cycle, and T11 is the time limit of one cardiac cycle, indicating the time interval from S1 to the start of S1 in the next adjacent cardiac cycle.
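Given segmented onset and offset times of S1 and S2, the four cardiac reserve times reduce to simple differences, as the sketch below shows. The segmentation itself is outside its scope. Measuring T12 from the onset of S1 to the onset of S2 is an assumption, since the text does not say which boundaries are used, and the function and argument names are hypothetical.

```python
def cardiac_reserve_times(s1_on, s1_off, s2_on, s2_off, next_s1_on):
    """Return T1, T2, T12 and T11 (same time unit as the inputs).

    s1_on/s1_off and s2_on/s2_off are the start and end of S1 and S2 in one
    cardiac cycle; next_s1_on is the start of S1 in the next cycle.
    """
    return {
        "T1": s1_off - s1_on,       # duration of S1
        "T2": s2_off - s2_on,       # duration of S2
        "T12": s2_on - s1_on,       # assumption: S1 onset to S2 onset
        "T11": next_s1_on - s1_on,  # one full cardiac cycle
    }


# Illustrative timings in seconds (not measured data).
times = cardiac_reserve_times(0.0, 0.125, 0.375, 0.5, 1.0)
print(times)  # prints: {'T1': 0.125, 'T2': 0.125, 'T12': 0.375, 'T11': 1.0}
```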

B. Feature selection
The extracted features may include relevant, irrelevant, and redundant features. Therefore, it is necessary to select, from all features, those that benefit classification learning. Feature selection methods can be divided into three types:
• Filter: scores each feature according to divergence or correlation, then selects features by a score threshold or a target number of features.
• Wrapper: according to an objective function (usually the prediction score), selects or excludes several features at a time.
• Embedded: first trains a machine learning model to obtain a weight coefficient for each feature, then selects features in descending order of coefficient. It is similar to the filter method, but the merit of each feature is determined through training.
The main goals of feature selection are to increase accuracy, find the minimal effective feature subset, and improve evaluation performance. This paper therefore uses six feature screening and sorting algorithms [37]: the filter methods are mRMR, KCCAmRMR, MIC, and QPFS, the wrapper method is RFECV, and the embedded method is a tree-based model.
1) mRMR. The Minimum Redundancy-Maximum Relevance (mRMR) algorithm is a filter feature selection method [38,39]. It balances relevance and redundancy, using mutual information as the criterion to measure both the redundancy between features and the relevance between features and class variables. Features are selected by maximizing the relevance between features and class variables while minimizing the redundancy between features.
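The greedy form of mRMR can be sketched for discrete features as follows. This is a generic illustration rather than the implementation used in the paper: it scores each candidate by its mutual information with the labels minus its mean mutual information with the features already chosen (the difference criterion), and it assumes the features have already been discretized. The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr_rank(features, labels):
    """Rank feature indices greedily by relevance minus mean redundancy.

    features is a list of columns, one discrete sequence per feature.
    """
    relevance = [mutual_information(f, labels) for f in features]
    remaining = list(range(len(features)))
    ranked = []

    def score(i):
        if not ranked:
            return relevance[i]
        redundancy = sum(mutual_information(features[i], features[j])
                         for j in ranked) / len(ranked)
        return relevance[i] - redundancy

    while remaining:
        best = max(remaining, key=score)  # ties go to the lowest index
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data: f1 duplicates f0, while f2 carries complementary information.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = [0, 0, 1, 1, 0, 0, 1, 1]
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
print(mrmr_rank([f0, f1, f2], labels))  # prints: [0, 2, 1]
```

All three toy features are equally relevant to the labels, yet the duplicate `f1` is ranked last because it is fully redundant with `f0`; a pure relevance ranking would not separate them.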
2) KCCAmRMR. An improved algorithm, called Kernel Canonical Correlation Analysis based on mRMR (KCCAmRMR) [40], was also developed, in which irrelevant redundancy is filtered out by an additional kernel canonical correlation analysis. Thus, only the relevant redundancy is considered in the subsequent mRMR procedure. As in mRMR, the feature selection criterion of the KCCAmRMR method has two terms: relevance and redundancy.

3) QPFS. Quadratic programming feature selection (QPFS) is a feature ranking algorithm that uses information theory as the similarity measure [41]. It solves an optimization problem to estimate the quality of the features in a given dataset. QPFS assigns a weight to each feature such that more important features receive larger values. The features are then sorted in decreasing order of weight, and the top features above a threshold value are taken as the final selected features.

Table 1 (excerpt). Extracted features.
No. | Feature | Description
23 | Second Quartile (Q2) | The 50th percentile of the data in ascending order
24 | Third Quartile (Q3) | The 75th percentile of the data in ascending order
 | T12 | The cardiac reserve time from S1 to S2 in a cycle

4) MIC. The maximal information coefficient (MIC) can quantify the correlation between continuous and qualitative variables, and can therefore measure the correlation between a continuous feature and a qualitative target variable [42,43]. The larger the MIC value, the stronger the discriminative ability of the corresponding feature. This algorithm calculates the correlation between each feature dimension and the heart sound sample label, so that the essential features can be selected.

5) Tree-based model. Tree-based prediction models can be used to calculate the importance of features, so they can be used to remove irrelevant features.
6) RFECV. The RFECV method consists of two parts [44]. The first is recursive feature elimination (RFE), which evaluates the importance of features. The second is cross-validation (CV), which selects the best number of features after feature evaluation.

A. Data source
The heart sound data used in our experiment come from an open source dataset ([Online] Available: https://github.com/yaseen21khan/) [25] and from data collected in our laboratory. Heart sound signals from heart valve defects were collected clinically. All recordings have been resampled to …. Tables 2 and 3 show the data source. The dataset was split into training data (80%) and testing data (20%).

B. Feature screening model comparison
This paper extracts 40 features from the heart sound reconstructed by adaptive EMD, fusing time domain, frequency domain, and nonlinear domain features with the cardiac reserve time features. Each type of heart sound signal yields a data set of 40 fusion features. The open source dataset includes five types of heart sounds. To validate the proposed feature fusion technique under various EMD methods, feature sets with and without the four cardiac reserve time features were tested. Six feature screening methods are used to rank feature importance, which reduces the feature set in preparation for input to the classifier.
Similarly, the 412 samples collected in the laboratory were subjected to the same experiment to build the 412×40 feature set. After removing the four cardiac reserve time features, the size of the remaining feature set is 412×36.

C. Classification accuracy
From the heart sound signals in the open source database and our laboratory database, the 36/40 features ranked by mRMR, KCCAmRMR, QPFS, MIC, Tree, and RFECV were incrementally fed into the RF classifier [45,46]. In Figs 9 and 10, each graph is composed of six polylines, each representing a signal composition method. The composition methods are as follows:
1. The features are extracted directly from the original heart sound without the EMD reconstruction process.
2. Use Corr & RMSE to select the IMF components for EMD reconstruction, and then extract 36 features.
3. Use the original HD to select the IMF components for EMD reconstruction, and then extract 36 features.
4. Use Corr & RMSE to select the IMF components for EMD reconstruction, and then extract 40 features.
5. After EMD, adaptively select the IMF components of the first seven layers by the HD threshold, reconstruct the signal, and then extract 40 features.
6. Use the original HD to select and reconstruct the IMF components, and then extract 36 features.
For each composition method, the six feature screening and sorting algorithms are applied, and the ranked features are finally input into the classifier.

PLOS ONE
Comparing the six feature selection methods in the classification experiment verifies the data mining effect of the fused feature data set. The main idea of this experimental part is as follows. First, heart sound preprocessing and feature reconstruction are performed by the different methods. Second, under the six feature screening algorithms (mRMR, KCCAmRMR, QPFS, MIC, tree-based model, and RFECV), the data set with 36 features, which lacks the four heart-sound-specific cardiac reserve time features, is compared with the data set with 40 features. The multidimensional features are sorted so that, in each data set, the features are ranked from most to least informative. Finally, the features are input into the classifier incrementally, starting from the highest-ranked one, to obtain the classification results. The collected data has more noise interference than the open source data. It can be observed that, among the six signal reconstruction methods, adaptive HD achieves clearly higher classification accuracy than the other reconstruction methods under all six feature selection methods. Comparing the blue curve with the black curve shows that the peak of the average accuracy curve for the 36 extracted features (without the T1, T2, T11, and T12 cardiac reserve time features) is lower than that for the 40 extracted features (including T1, T2, T11, and T12). Therefore, the fusion features give better results than the ordinary features alone.
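The incremental evaluation described above can be sketched as a loop that adds one ranked feature at a time and records the test accuracy, so that the peak of the resulting curve gives the number of features to keep. To stay self-contained, the sketch swaps in a small K-nearest-neighbor classifier (one of the three classifiers named in the paper) in place of the random forest used for Figs 9 and 10. The data are made up and the function names are hypothetical.

```python
import math


def knn_predict(train_x, train_y, sample, k=1):
    """Predict the label of sample by majority vote of its k nearest neighbors."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], sample))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def accuracy_curve(train_x, train_y, test_x, test_y, ranking, k=1):
    """Test accuracy when using the top 1, 2, ... features of ranking."""
    curve = []
    for n in range(1, len(ranking) + 1):
        cols = ranking[:n]
        tr = [[row[c] for c in cols] for row in train_x]
        te = [[row[c] for c in cols] for row in test_x]
        hits = sum(knn_predict(tr, train_y, s, k) == y
                   for s, y in zip(te, test_y))
        curve.append(hits / len(test_y))
    return curve


def best_feature_count(curve):
    """Smallest number of features that reaches the peak accuracy."""
    return curve.index(max(curve)) + 1


# Toy data: feature 0 separates the classes, feature 1 is misleading.
train_x = [[0.0, 5.0], [1.0, -5.0]]
train_y = [0, 1]
test_x = [[0.1, -5.0], [0.9, 5.0]]
test_y = [0, 1]
curve = accuracy_curve(train_x, train_y, test_x, test_y, ranking=[0, 1])
print(curve, best_feature_count(curve))  # prints: [1.0, 0.0] 1
```

In the toy case the accuracy peaks with the single top-ranked feature and drops once the misleading feature is added, which is the kind of peak the accuracy curves are used to locate.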
Tables 4-9 compare the six methods under the six feature selection algorithms on the open source and laboratory datasets. The feature fusion technique is also tested by comparing 36 features (without the T1, T2, T11, and T12 cardiac reserve time features) and 40 features (including T1, T2, T11, and T12). As shown in Tables 4-9, the method without EMD (No EMD) gets the worst … Table 8. In the laboratory dataset, the classification accuracy reached 76.21%, and the number of selected features decreased from 40 to 12 in Table 6.
The proposed method was run on a Legion R9000P2021H computer with an AMD Ryzen 7 processor and the Windows 10 operating system. Using EMD+Adaptive HD, classifying the heart valve sound of one sample from the open source dataset takes about 10 s. The heart sound classification accuracy on the laboratory database was 77.18% when 12 features were selected. Table 11 compares the proposed method with previous research on the open source Yaseen dataset. As can be seen from Table 11, Yaseen obtained an accuracy of 97.9% with SVM in 2018 [25]. In 2020, Baghel achieved an accuracy of 98.6% using a CNN [47], and Oh attained an accuracy of 98.2% with WaveNet [48]. Our proposed method achieves the best accuracy among these algorithms.

Conclusion
The heart sounds with valve defects were effectively distinguished from the normal heart sounds. This paper used adaptive Empirical Mode Decomposition (EMD) and a feature fusion strategy to classify heart sounds. Several feature selection methods and classifiers were compared. Experimental tests on two databases validated the effectiveness of the proposed IMF reconstruction method with adaptive Hausdorff Distance thresholds. Our proposed feature fusion technique with 40 features, including ordinary features and cardiac reserve time features, achieves robust and excellent results. The experimental results show that the proposed methods, adaptive EMD and feature fusion, are of great value for the clinical auxiliary diagnosis of heart disease. Although the method performs well on the public dataset, the number and type of samples collected in our laboratory are insufficient. To achieve better experimental results, more samples need to be collected to verify the robustness and effectiveness of our proposed algorithm.
Several aspects can still be optimized in future work, such as obtaining more heart disease datasets, using ML and AI techniques to analyze heart sounds and improve the accuracy of heart sound classification, and reducing the time cost of the algorithm.